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1  Introduction 

At  the  heart  of  many  visual  processing  tasks  lies  the  determination  of  correspon¬ 
dence  between  features  in  two  images.  For  example,  correspondence  is  at  least 
implicitly  required  in  stereopsis,  structure  from  motion,  and  object  recognition. 
Often  the  matching  is  underconstrained  because  the  features  can  be  matched  in 
more  than  one  way.  Furthermore,  the  search  space  for  possible  matches  may  be 
prohibitively  large.  Assumptions,  usually  derived  from  regularities  of  the  physi¬ 
cal  world,  are  therefore  necessary  to  constrain  matching  solutions  uniquely.  For 
example,  in  stereopsis  one  can  constrain  the  search  to  small  areas  along  an  epipo- 
lar  line  by  exploiting  uniqueness,  continuity,  and  probabilistic  assumptions  about 
disparity  [24,  26,  33]. 

We  address  the  problem  of  matching  contours  in  a  pair  of  images,  assuming 
partial  constraints  on  the  matching  are  available.  As  illustrated  by  Figure  1,  we 
are  given:  (1)  an  image  containing  a  set  of  contours;  (2)  a  second  image  obtained 
by  displacing,  stretching,  rotating,  scaling,  and  distorting  these  contours;  and,  (3) 
matching  constraint  lines  for  several  points  along  the  first  set  of  contours  con¬ 
straining  the  match  for  each  of  these  points  to  a  line  in  the  second  image.  The 
contour  images  may,  for  instance,  be  obtiuned  by  applying  an  edge  finding  algo¬ 
rithm  to  either  sampled  time-varying  imagery  or  two  different  views  of  a  scene. 
The  constraint  lines  accompanying  these  two  images  can  often  be  estimated  for 
many  matching  applications.  One  such  application  is  the  measurement  of  visual 
motion.  In  determining  optical  flow  along  isobrightness  contours,  the  tangential 
component  of  motion  frequently  must  be  constrained  by  additional  assumptions 
due  to  the  aperture  problem  [25,  10,  17,  1,  14).  Local  measurements  of  motion 
in  regions  without  trackable  features  (such  as  edge  corners)  can  capture  the  com¬ 
ponent  of  image  velocity  only  in  the  direction  normal  to  isobrightness  contours. 
Therefore,  the  match  for  a  point  in  one  image  fr2une  is  often  constrained  at  best 
to  a  line  in  the  next  image  frame. 

Another  problem  requiring  the  matching  of  contours  arises  in  alignment-based 
methods  for  object  recognition  (for  an  overview  of  alignment  approaches,  see  [22, 
9,  40, 18,  5]).  In  these  methods,  global  transformations  between  model  and  object 
views  are  compensated  for  by  a  normalization  stage,  which  aligns  the  two  views 
and  allows  subsequent  comparison.  These  views  can  be  represented  by  contours. 
Alignment  schemes  require  that  object  and  model  features  be  matched  at  some 
stage.  Assuming  that  model  and  object  views  can  be  roughly  aligned  using  global 
image  properties  or  several  pre-matched  “anchor  points”  [18],  constraint  lines  can 
often  be  estimated  for  each  object  contour  point  by  the  tangent  at  the  closest  model 
contour  point  [4].  (The  main  inaccuracies  in  this  method  will  occur  for  constraint 
lines  at  high  curvature  points,  which  are  simply  ignored  in  the  matching  process.) 
A  similar  method  can  be  used  in  finding  correspondence  for  apparent  motion. 

This  paper  describes  a  new  scheme  for  solving  these  contour  matching  problems. 


1 


Figure  1:  To  find  a  point-to-point  correspondence  between  contours  in  two  images 
we  are  given  constraint  lines  for  several  points  in  the  first  image.  Each  constraint 
line  narrows  down  the  match  for  a  point  in  the  first  image  to  a  line  in  the  second. 
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First,  we  outline  several  previous  methods  for  matching  contours  in  the  contexts  of 
both  recovering  optical  flow  and  finding  matches  for  object  recognition.  Second,  we 
describe  the  new  scheme  which  uses  local  affine  transformations  to  predict  matches 
between  contours.  Next,  we  briefly  review  some  motivating  data  concerning  the 
computation  of  pattern  motion  in  the  primate  visual  system.  Finally,  we  present 
simulation  results  obtained  on  noisy  synthetic  auid  natural  imagery. 

2  Previous  Matching  Ideas 

In  computing  optical  flow,  several  methods  have  assumed  that  the  motion  of  pat¬ 
terns  can  be  described  at  least  locally  by  pure  translation  [21,  32, 1].  This  assump¬ 
tion  can  be  used  to  track  objects  in  the  case  of  simple  observer  motion,  but  a  more 
general  assumption  is  necessary  to  account  for  objects  undergoing  combinations  of 
translation  with  three-dimensional  rotation,  scaling,  and  deformation.  A  common 
approach  to  the  general  motion  problem  is  to  find  the  velocity  field  with  the  least 
variation  that  is  consistent  with  local  motion  measurements  [17,  27,  2,  11,  19]. 
This  idea  is  based  on  the  observation  that  objects  are  locally  rigid.  Such  meth¬ 
ods  have  been  applied  either  in  a  region-based  manner  [10,  16],  or  along  image 
contours  [12,  13,  28].  These  schemes  usually  require  slowly  converging  iterative 
methods,  such  as  the  conjugate  gradient  method,  to  efficiently  recover  a  global 
solution  constrained  by  hundreds  of  linear  equations  [13].  They  ako  suffer  from 
stability  problems  at  or  near  the  degenerate  case  of  translating  linear  contours. 

A  number  of  methods  have  assumed  that  objects  in  motion  can  be  described 
by  planar  patches,  an  idea  originally  developed  for  the  recovery  of  structure  from 
motion  [43].  This  idea  is  the  basis  for  the  "velocity  functional  method”  of  Waxman 
and  Wohn  [44].  Their  method  determines  the  second  order  terms  of  the  Taylor 
series  expansion  of  optical  flow  by  satisfying  the  normal  flow  constraints  within 
fixed  image  neighborhoods.  In  each  neighborhood  a  residual  indicating  "goodness 
of  fit”  is  used  to  segment  the  motion  field  into  analytical  regions.  Because  the  size  of 
these  neighborhoods  remains  fixed,  however,  highly  non-planar  segment  boundaries 
cannot  be  accurately  described.  The  method  also  has  difficulties  with  matching 
contours  that  are  perspective  projections  of  less  than  biquartic  complexity  (such 
as  ellipses,  parabolas,  etc).  This  singular  situation,  termed  "the  aperture  problem 
in  the  large”,  primarily  arises  when  the  normal  flow  in  the  fixed  neighborhoods 
does  not  adequately  constrain  the  series  coefficients.  As  with  smoothness,  the 
solution  becomes  unstable  (sensitive  to  noise)  as  objects  approach  these  singular 
configurations. 

Other  methods  that  exploit  local  planarity  include  afline  motion  methods  that 
select  local  neighborhoods  using  iterative  successive  refinement  [7,  6].  At  a  given 
spatial  scale  a  low  pass  filter  with  appropriate  bandwidth  is  applied  to  the  residual 
motion  field.  Then,  assuming  the  motion  field  varies  smoothly,  the  residual  motion 
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field  obtained  at  large  spatial  scales  influences  the  calculation  of  the  motion  field  at 
smaller  scales.  Although  convergence  is  often  quite  rapid  for  nearly  planar  surfaces 
in  motion,  many  iterations  are  required  for  highly  non-planar  surfaces  undergoing 
large  motions.  No  attempt  is  made  to  shape  neighborhoods  so  that  normal  flow  is 
integrated  primarily  along  connected  contours.  Finally,  singular  conditions  similar 
to  those  affecting  the  Taylor  series  approximation  methods  again  leave  even  the 
largest  spatial  scales  unstable. 

In  object  recognition,  several  contour  matching  schemes  employ  correlation 
measures  or  examine  angles  and  intersections  between  contours.  Such  methods 
either  assume  perfect  alignment,  or  conduct  a  complete  search  through  feature 
space.  They  cannot  deal  with  un-parameterized  distortions,  or  inaccuracies  preva¬ 
lent  in  contour  images.  They  also  typically  lack  efficient  implementation.  Recently, 
a  number  of  methods  have  been  devised  for  comparing  contours  using  affine  in¬ 
variant  curvature  [8],  arc  length  [3],  or  moments  [15].  Such  methods  parameterize 
contours  according  to  one  of  these  measures,  normalize  the  contours  by  shifting 
their  representation  in  parameter  space,  and  correlate  the  resulting  invariant  rep¬ 
resentations.  The  main  drawback  of  these  methods  is  that  they  are  global  mea¬ 
sures,  applicable  only  to  complete  curves.  As  a  result,  these  methods  cannot  deal 
with  noise  or  occlusion.  Furthermore,  as  the  objects  producing  the  contour  image 
becomes  less  planar,  the  affine  invariance  becomes  less  valid  for  describing  trans¬ 
formations.  Another  disadvantage  is  that  the  calculation  of  these  affine  invariant 
quantities  requires  a  high  degree  of  differentiation  along  contours,  implying  both 
that  contour  tracing,  contour  thinning,  and  contour  enhancement  be  performed  in 
order  to  ensure  a  unique  path  along  the  contour,  and  that  contom  smoothing  be 
pterformed  in  order  to  avoid  large  errors  in  the  differentiation  due  to  noise.  Finally, 
these  recognition  techniques  are  intended  as  verification  steps  taken  after  a  match¬ 
ing  has  already  been  hypothesized  by  exponential  search  or  inaccurate  heuristics. 
This  is  also  true  of  other  affine  matching  techniques  [20,  42]. 

3  The  Proposed  Solution 

In  the  scheme  described  below  the  transformations  of  contours  within  local  image 
neighborhoods  are  assumed  to  be  affine.  We  implicitly  find  the  affine  transforma¬ 
tion  through  a  least  squares  fit  to  available  match  constraint  lines  and  pre-matched 
points.  The  affine  transformation  is  established  between  local  contour  segments, 
rather  than  complete  curves  (as  in  [3]),  based  on  the  displacement  field  within  a 
small  patch  (as  in  [44]).  It  constrains  the  matching  by  assuming  that  contours  are 
the  orthographic  projections  of  locally  coplanar  points,  thereby  reducing  the  re¬ 
covery  of  correspondence  to  a  local,  linearly  constrained,  non-iterative  calculation. 

Local  neighborhoods  are  constructed  so  that  constraint  line  information  is 
smoothly  integrated  primarily  over  proximal  connected  contours.  This  is  accom- 
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pushed  by  using  elUptical  Gaussian  neighborhoods  oriented  along  the  contour.  To 
choose  the  appropriate  size  of  the  neighborhoods,  we  consider  several  neighbor¬ 
hoods  of  differing  sizes  simultaneously  for  a  given  point.  From  these  neighbor¬ 
hoods,  we  choose  the  largest  neighborhood  within  which  an  affine  transformation 
can  accurately  satisfy  the  available  constraints.  A  stable,  unique  solution  is  guar¬ 
anteed  for  the  chosen  neighborhood  by  using  a  modified  pseudoinverse  method 
to  find,  subject  to  the  constraints,  the  matches  that  deviate  the  least  from  the 
minimal  purely  translational  correspondence. 

The  following  sections  detail  each  facet  of  the  scheme  in  an  increment2d  fashion. 
We  start  with  the  matching  of  contours  that  are  orthographic  projections  of  rigid 
planar  surfaces. 

4  Matching  Globally  Planar  Contours 

Let  us  first  assume  that  the  transformation  between  two  image  contours  can  be 
described  by  a  global  affine  mapping.  That  is,  each  contour  point  =  (x<,yi)  in 
the  first  image  maps  to  contour  point  p\  =  (xj,  y,-)  in  the  second  image  by  the 


linear  equation 


P'i  =  ^Pi+t  (1) 

where  the  2x2  matrix  A  accounts  for  the  two-dimensional  shearing,  scaling,  and 
rotation  in  the  image  plane  about  the  global  origin,  po  =  (0,0),  and  the  vector  t 
accounts  for  two-dimensional  translation  in  the  image  plane.  Next,  consider  the 
contours  in  Figure  2.  Suppose  that  the  exact  location  of  P'.  is  unknown,  but  that 
it  lies  along  some  known  constraint  line  in  the  image.  The  perpendicular  distance 
between  the  match  point  p|-  and  the  constraint  line  should  be  zero.  Therefore,  if 
Hi  is  the  perpendicular  from  pi  to  the  constraint  line,  and  hi  is  the  unit  normal  to 
the  constraint  line,  then  the  constraint  line  is  described  by 

(p-  +  n,  -  p',)^n<  =  0.  (2) 

Substituting  (1)  into  (2)  yields 

(Api)^n.-  -I-  t^n.-  =  pf  n.  +  jn,].  (3) 


Since  the  right  hand  side  of  this  equation  can  be  computed  from  available  infor¬ 
mation,  we  obtain  one  linear  equation  constraining  the  six  parameters  of  the  affine 
transformation  for  each  point.  If  we  represent  the  six  affine  parameters  as  a  vector 


=  [  A(so 

■jT 

Aoi  y4io  All  lx  j  7 

(4) 

then  (3)  can 

be  rewritten  as 

c,a  =di 

(5) 

where 

hi,yi  hi^Xi  hi^pi  hi^  hi^  ] 

(6) 

d,  =  pjhi  +  lni|. 

(7) 

(Note  that  c,  =  0  and  dj  =  0  if  either  no  constraint  information  is  available  for 
point  Pi,  or  pi  does  not  lie  on  a  contour.)  Thus,  for  a  system  of  k  points  and 
associated  equations  (where  k  is  the  number  of  points  in  the  image). 


Ca  =  d  (8) 

where  C  is  a  ^  x  6  matrix  with  rows  Cj . . .  Ck,  and  d  is  a  vector  with  elements 
dj . . .  dk-  When  there  are  more  than  six  independent  equations,  (8)  should  be 
solved  in  the  least  squares  sense.  Specifically,  we  solve  the  system 

C^Ca  =  C^d  (9) 

which  minimizes  the  mean  squared  distance  between  each  match  and  its  constraint 
line.  This  equation  predicts  matches  for  all  points,  even  points  without  constraint 
lines,  provided  that  an  affine  transformation  is  uniquely  determined  by  existing 
constraint  lines  in  the  image.  (Later,  we  extend  this  prediction  to  the  underdeter¬ 
mined  case.) 
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5  Incorporating  Matched  Feature  Points 


In  some  matching  problems,  the  exact  point-to-point  correspondence  may  be  known 
for  some  special  feature  points,  such  as  corners,  terminators,  high  curvature  points, 
inflection  points,  and  isolated  points.  To  incorporate  the  influence  of  a  feature 
point  match  between  pi  and  p',  we  once  again  assume  the  match  is  given  by  equa¬ 
tion  (1)  and  minimize  the  distance, 

l(Api-|-t)-p'J.  (10) 

Hence,  the  two  constraint  equations  for  each  feature  point  p,  are  given  by 

Fi^  =  p'i  (11) 


where 


Xi  ffi  0  0  10 

0  0  X,  y,  0  1 


(12) 


(Once  again,  Fi  =  0  and  p|  =  0  when  pi  has  no  given  feature  point  match.)  Note 
that  these  two  equations  describe  two  perpendicular  lines  that  intersect  at  p',-. 
Feature  matches  can  therefore  be  treated  as  a  special  type  of  contour  point  that 
contributes  two  constraint  lines  instead  of  one.  Again,  for  a  system  of  k  feature 
points,  we  have 

Fa  =  g  (13) 

where  F  is  a  2A:  x  6  matrix  consisting  of  rows  Fi . . .  F*,  and  g  is  the  length  2jb 
concatenation  of  p'l . . .  p\.  Combining  (13)  with  (8)  yields 


■(1-q)C‘ 

■(l-o)d' 

aF 

A  = 

.  O'®  . 

(14) 


where  a,  a  number  between  0  and  1,  is  the  accuracy  of  feature  point  matches 
relative  to  the  accuracy  of  contour  point  constraint  lines.  Finally,  finding  the  best 
affine  transformation  amounts  to  solving  the  least  squares  relation 

((1  -  a)C^C  +  qF^F)  a  =  (1  -  a)C^d  oF^g,  (15) 


obtained  by  differentiating  (14)  with  respect  to  a  and  setting  i,he  result  to  zero. 
Again,  solving  this  system  of  equations  for  a  determines  the  best  global  affine 
transformation  about  the  global  origin,  po. 


6  Matching  Locally  Planar  Contours 

Since  contours  are  generally  perspective  projections  of  non-planar,  non-rigid  ob¬ 
jects,  (1)  cannot  in  general  accurately  describe  matches  globally.  The  scheme 
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therefore  enforces  this  affine  transformation  assumption  only  locally.  That  is,  the 
constraint  information  is  integrated  within  a  local  neighborhood  around  each  con¬ 
tour  point,  pj.  This  local  neighborhood  is  constructed  by  weighting  the  constraints 
at  each  point. 

To  fulfill  this  neighborhood  requirement,  it  is  convenient  to  describe  the  com¬ 
putations  using  a  local  coordinate  system  at  each  contour  point,  py.  The  location 
of  each  point,  pi,  is  measured  with  respect  to  the  local  origin,  Pj,  instead  of  the 
global  origin,  Po.  To  make  the  calculations  local  we  also  weigh  constraints  for 
each  point,  Pi,  by  some  locality  measure,  u;,.  The  “closer”  Pi  to  pj,  the  larger 
this  weight.  Employing  the  method  of  weighted  least  squares  to  incorporate  this 
weighting  scheme  into  (15),  we  find  the  local  affine  transformation,  and  hence  the 
match,  for  each  contour  point,  pj,  by  satisfying  a  system  of  equations 


ftasrl  (16) 

where 

R  =  il-a)C'^W^C  +  aF'^W^F  (17) 

1  =  (1  -  Q)C'^W^d  +  aF'^W^g.  (18) 

The  diagonal  matrix  W  establishes  the  local  neighborhood  at  pj  and  is  given  by 

H^  =  diag(a?i...w*).  (19) 

Note  that  the  resulting  6x6  matrix,  i2,  and  the  six-dimensional  vector,  1,  can 
both  be  written  explicitly  as  sununations  of  point  locations,  normads,  and  weights 
by  expanding  the  matrix  definitions.  Therefore,  the  elements  of  these  matrices  can 
easily  be  calculated  in  parallel  for  each  local  neighborhood  using  simple  adders. 
In  the  following  sections  we  consider  appropriate  ways  to  shape  and  scale  this 
neighborhood,  thereby  defining  W. 

7  Neighborhood  Shape: 

Oriented  Elliptical  N  ighbor hoods 

The  weight  for  each  point  determines  the  relative  influence  of  its  constraints  upon 
the  solution.  The  set  of  all  point  weights  therefore  determines  the  extent  and 
shape  of  the  local  neighborhood.  We  determine  the  weights  according  to  several 
neighborhood  criteria.  First,  the  neighborhood  integration  should  monotonically 
decrease  with  distance  from  the  local  origin,  since  distant  points  are  less  likely  to  be 
coplanar  with  p^.  Second,  the  neighborhood  should  be  smooth  so  that  matching 
solutions  vary  continuously  along  contours.  (Note,  however,  that  this  will  not 
mandate  the  smoothest  solution  along  the  contour.)  Finally,  the  neighborhood 
should  be  maximally  elongated  and  oriented  along  the  contour  in  order  to  integrate 
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Figure  3:  (a)  Integration  along  nearly  linear  contour  segments  is  primarily  along 
the  contour,  while  (h)  integration  over  symmetric  sections  of  the  contour  is  circular. 
Oriented  local  neighborhoods  are  determined  by  the  product  of^i,  a  Gaussian  of 
the  distance  from  the  local  origin  pj,  and  m,  a  Gaussian  of  the  distance  from  the 
axis  of  local  orientation  passing  through  py.  The  width  of  the  latter  Gaussian  is 
determined  by  the  strength  of  the  local  orientation. 


information  primarily  along  connected  proximal  contours  without  serial  contour 
tracing. 

With  these  criteria  in  mind,  we  suggest  that  the  weight  for  p,  actually  be  the 
product  of  two  Gaussian  weights; 


Wi  =  r/ipi. 


(20) 


The  first  weight,  7,,  is  given  by 

7i  =  exp  (~|pi|V2<7’i„)-  (21) 

The  set  of  all  7’s  constructs  a  circularly  symmetric  Gaussian  neighborhood  about 
the  local  origin.  The  standard  deviation  of  this  Gaussian,  (T,ize,  is  the  effective  size 
of  this  neighborhood.  This  parameter  is  discussed  in  the  next  section.  The  second 
weight,  u,,  is  given  by 

(22) 
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where  is  the  perpendicular  distance  from  pi  to  the  axis  of  local  orientation  pass¬ 
ing  through  pj.  It  modulates  the  circularly  symmetric  Gaussian  and  orients  the 
resulting  elliptical  neighborhood  to  integrate  information  primarily  over  connected 
contours  without  explicitly  performing  connectivity  analysis  or  contour  tracing. 
The  width  of  this  Gaussian,  Othape,  ranges  between  0  and  oo,  corresponding  to 
maximal  local  orientation  and  circular  symmetry  respectively.  Hence,  the  stronger 
the  local  orientational  preference  of  the  contour,  the  higher  the  aspect  ratio  of  the 
elliptical  neighborhood,  and  the  larger  the  relative  influence  of  points  that  lie  closer 
to  the  local  axis  of  orientation.  This  neighborhood  construction  is  illustrated  in 
Figure  3. 

We  now  determine  (assuming  first  that  we  know  ante)  keeping  in  mind 

that  the  oriented  neighborhood  must  be  narrow  for  linear  contours  and  wide  for 
circularly  symmetric  contours.  To  capture  this  notion,  we  propose  that  the  major 
and  minor  axes  of  the  elliptical  neighborhood  be  respectively  aligned  with  and 
proportional  (by  some  proportionality  constant  0)  to  the  major  and  minor  axes  of 
the  principal  component  ellipse  of  the  local  binary  contour  image.  The  principal 
component  ellipse  can  be  derived  through  principal  component  analysis  of  the 
inertia  matrix 


J  = 


'xy 
hv  J 


where 


i=\ 


Jxy  ~  ^  ^  li  Jyy 


«*1 


1=1 


(23) 


(24) 


are  the  second  local  moments  about  the  local  y,  — zy,  and  z  axes  respectively, 
and  6,  is  the  value  of  the  binary  contour  image  at  pj.  First,  the  second  moment 
extremes  are  the  eigenvalues  of  J, 

Jxx  "I"  Jyy  P  \  _  «^ix  ~i~  Jyy  P 


— 


^min  — 


(25) 


where 


P  =  yJiJxx  -  Jyy)^  +  (2Jxv)'-  (26) 

These  eigenvalues  give  the  axes  magnitudes  of  the  principal  component  ellipse, 
to  which  the  neighborhood  axes  should  be  proportional.  If  the  major  axis  of  the 
neighborhood  is  <7,,,*  =  0^max,  then  the  minor  axis  is  simply  /3Am,„  = 

This  desired  minor  axis  is  the  effective  width  of  the  modulated  Gaussian  in  the 
direction  perpendicular  to  the  axis  of  orientation;  that  is, 

Amin 


-2  ,  -2 
tize  ’’  "  zhape 


)-i. 


Rearranging,  we  obtain 


^ shape  — 


JK 


(27) 


(28) 
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Next,  Si  is  given  by  the  scalar  product  of  p,  and  the  unit  normal  to  the  axis 
of  local  orientation,  defined  by  the  unit  eigenvector  of  J  corresponding  to  the 
eigenvalue  A,„ax: 

^  \yj P  "b  sign(  Jxv)(*^vi/  ~  Jxx)i  ~  sign(Jxv)^P  "I"  sign(  Jxy)(«^ir  ~  *^y»)  mtw 

ex  =  - .  (29) 

Hence, 

Si  =  ^Ipi  (30) 

(see  [16]  for  a  more  complete  analysis  of  using  the  principal  component  method  to 
find  the  orientational  preference  of  binary  images). 

Note  that,  although  there  may  be  problons  with  finding  the  orientation  of  a 
contour  that  is  surrounded  by  several  other  contours,  surrounding  contours  should 
not  in  general  affect  the  Anal  solution  long  as  we  have  chosen  a  sufficiently  small 
neighborhood  in  which  to  carry  out  the  calculations.  This  issue  is  addressed  next. 


8  Neighborhood  Size: 

Choosing  From  Multiple  Neighborhood  Scales 


For  several  reasons  it  is  desirable  to  include  in  the  computation  a  mechanism  for 
selecting  the  size  of  each  integration  neighborhood.  As  the  size  of  the  local  neigh¬ 
borhood  decreases,  the  local  planarity  assumption  is  better  approximated,  and  the 
possible  interference  from  nearby  contours  decreases.  However,  if  the  neighbor¬ 
hood  is  too  small,  lack  of  adequate  constraints  on  the  local  affine  transformation 
produces  an  unstable  solution.  Clearly  there  is  a  tradeoff  between  extracting  suf¬ 
ficient  information  for  unique  solutions  within  large  neighborhoods,  and  satisfying 
local  planarity  assumptions  and  insulating  computations  from  interfering  contours 
with  small  neighborhoods. 

To  solve  this  problem  we  use  a  number  of  spatial  scales  computed  in  paral¬ 
lel.  Out  of  several  possible  solutions  obtiuned  for  different  neighborhoods  (several 
values  of  (Tgize)  around  Pj,  the  solution  corresponding  to  the  largest  spatial  scaile 
that  best  satisfies  (16)  prevails.  There  are  several  possible  ways  to  determine  how 
well  a  given  spatial  scale  uniquely  satisfies  the  constraints.  For  instance,  one  could 
use  the  least  squares  residual  to  determine  how  well  the  assumptions  fit  the  local 
neighborhood.  Using  this  method,  we  solve  equation  (16)  at  several  scales  and 
choose  the  largest  scale  for  which  the  least  squares  residual  is  small.  Alternatively, 
one  could  use  the  condition  number  of  the  matrix  R  to  determine  if  a  given  local 
scale  is  too  small.  Under  this  criterion  we  take  the  solution  at  the  smallest  spa¬ 
tial  scale  for  which  the  condition  number  of  R  is  lower  than  some  number, 
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Finally,  we  could  choose  the  spatial  scale  based  on  some  combination  of  these  two 
methods. 

The  condition  number  criterion  is  axlopted  here  for  the  sake  of  efficiency  in  serial 
simulation.  First,  this  criterion  allows  us  to  try  smaller  scales,  which  require  less 
integration  time,  before  larger  ones.  Second,  it  does  not  require  that  we  actually 
solve  (16)  at  each  scale,  as  does  the  least  squares  residual.  In  setting  the  maximum 
condition  number,  Kman  one  must  consider  the  accuracy  of  the  constraint  lines  and 
feature  points.  If  these  quantities  are  known  to  be  accurate,  then  K^ax  should  be 
high  to  promote  precise  solutions.  If,  however,  the  accuracy  is  low,  then  Kmax 
should  be  low  to  decrease  the  solution's  sensitivity  to  such  errors. 

9  Minimal  Solutions  for  Ambiguous  Cases 

Despite  selection  of  the  optimal  neighborhood,  the  solution  to  (16)  can  be  under- 
determined  or  unstable,  even  in  the  largest  neighborhood.  This  situation  occurs 
when  there  exists  more  than  one  affine  mapping  between  two  contours.  Consider, 
for  example,  a  linear  contour  where  the  constraint  lines  for  all  points  are  identical. 
In  this  case  any  combination  of  scaling  or  shearing  along  the  constraint  line,  in 
addition  to  any  translation  taking  the  line  to  the  constraint  line,  satisfies  the  con¬ 
straints.  Another  example  is  the  matching  of  concentric  circles.  If  at  each  point 
along  the  first  circle  the  constraint  line  is  parallel  to  the  tangent  to  the  circle  at 
that  point,  then  any  amount  of  rotation,  coupled  with  an  appropriate  scaling,  is 
possible  (see  examples  in  Figure  4). 

For  these  singular  cases,  we  select  the  minimal  solution.  Intuitively,  the  min¬ 
imal  solution,  chosen  from  the  set  of  solutions  which  satisfy  the  available  con¬ 
straints,  is  closest  to  the  smallest  purely  translational  matching.  This  idea  is  based 
entirely  on  intuition.  First,  it  is  conceivable  that  many  object  transformations  not 
described  by  a  unique  affine  transformation  (a  first  order  transformation)  can  be 
described  by  a  unique  translation  (a  zero-th  order  transformation).  Second,  when 
even  the  tramslational  matching  is  ambiguous,  it  seems  that  the  solution  should 
be  closest  to  the  normal  component  of  the  match  to  reflect  the  lack  of  constraints. 
These  notions  are  illustrated  in  Figure  4.  Any  amount  of  rotation  of  the  circle,  or 
any  amount  of  tangential  translation  of  the  line,  results  in  a  set  of  match  vectors 
that  deviate  more  from  their  normal  components. 

In  formulating  the  mechanism  for  choosing  the  minimal  solution  defined  above, 
we  must  retain  some  important  features.  First,  the  mechanism  must  uniquely 
(stably)  determine  matches,  regardless  of  the  quantity  or  quality  of  the  constraint 
information.  Second,  the  mechanism  must  preserve  continuous  matches  along  con¬ 
tours.  Finally,  the  minimal  solution  must  have  a  closed  form  solution  that  is 
efficiently  calculated  at  each  point  without  iteration. 
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One  way  to  satisfy  all  these  requirements  is  to  minimize  the  cost  function 

A  =  5^(u;i|Api  +  t-(p.  (31) 

i=l 

which  summarizes  over  the  entire  neighborhood  the  squared  deviation  between  pre¬ 
dicted  match  vectors  and  tmin,  the  smallest  pure  translation  that  satisfies  neighbor¬ 
hood  constraints  in  the  least  squ2ires  sense.  Even  for  collinear  points  with  parallel 
constraint  lines  (which  can  be  matched  by  many  purely  translational  transforma¬ 
tions),  A  is  uniquely  minimized  by  the  average  neighborhood  normal  component. 
Furthermore,  A  is  robust  to  noise  because  it  integrates  information  over  the  entire 
local  neighborhood,  as  opposed  to  directly  minimizing  the  deviation  between  each 
match  and  the  normal  component,  which  is  sensitive  to  noise  in  the  constraint 
line.  As  for  smoothness,  calculation  over  Gaussian  local  neighborhoods  guarantees 
continuous  variation  of  A.  Finally,  A  is  computed  independently  of  (in  parallel 
with)  the  final  solutions  at  other  points,  as  opposed  to  global  selection  methods 
which  select  the  set  of  contour  matches  to  optimize  an  overall  measure,  such  as 
smoothness  along  the  contour. 

Assuming  tmin  computed  (see  later  section),  minimizing  A  is  identi¬ 

cal  to  finding  the  affine  transformation  that  matches  feature  points  (the  feature 
matches  for  each  point  pj  simply  being  pj  -f  tmin)-  Therefore,  to  minimize  A  we 
solve  a  system  of  linear  equations,  similar  to  those  of  (13),  given  by 

Qa  *  h  (32) 

where 

Q  =  S'^W^S,  h  =  S'^W^y.  (33) 

Here  5,  similar  to  F,  is  a  2A:  x  6  dimensional  matrix  with  rows  5i . . .  5*,  where 

Xi  yi  0  0  10 
0  0  X,  y,  0  1  ’ 

and  V,  similar  to  g,  is  a  length  2k  concatenation  of  Vj . . .  v^,  where 

Vi  =  p<  -f  t,nm.  (35) 

To  minimize  A  subject  to  the  matching  constraints  we  solve 

nnn(a^Qa  —  2a^h)  subject  to  R&  =  1.  (36) 

Once  again,  t„i„  is  the  smallest  least  squares  neighborhood  pure  translation, 
which  we  have  not  yet  mathematically  defined.  In  the  next  section  we  simplify  the 
existing  constraints  in  order  to  directly  compute  this  default,  and  eventually  the 
match  vector  itself,  later  in  the  paper. 
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(t)  0>) 


Figure  4:  Minimal  sclutiona  for  ambiguous  transformations  are  intuitive:  (a)  con¬ 
centric  circles  could  be  matched  unth  any  amount  of  rotation,  but  pure  expansion 
should  be  the  default;  (b)  parallel  lines  could  be  matched  using  a  translation  in  any 
direction  within  90  degrees  of  the  normal  component,  but  the  default  should  be  the 
normal  component. 
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10  Simplifications  Arising  from 
an  Implicit  Formulation 

Before  deriving  closed  form  solutions  for  t^in  &nd  the  match  vector,  it  is  useful 
to  digress  and  consider  how  we  will  eventually  obtain  final  matches  for  contour 
points  given  the  computed  local  affine  transformations.  This  analysis  will  simplify 
subsequent  computations. 

It  turns  out  that  a  point’s  match  depends  only  on  the  translational  component 
of  its  local  affine  transformation.  That  is,  the  final  match  for  each  local  origin,  p^, 
is  given  by 

p'j  =Apj+t  =  A  J  + 1  =  t.  (37) 

Therefore,  we  need  only  implicitly  solve  for  the  rotation,  scaling,  and  shearing 
components  and  thereby  directly  compute  the  match  vector.  In  fact,  an  implicit 
formulation  should  be  more  efficient  and  stable.  Increased  efficiency  follows  from 
inverting  2x2  and  4x4  submatrices  of  the  6  x  6  R  matrix  instead  of  inverting 
R  itself.  Increased  stability  follows  from  analysis  of  the  elements  of  R:  since  the 
coefficients  of  t  do  not  depend  on  the  size  of  the  local  neighborhood,  while  the  co¬ 
efficients  of  A  grow  as  the  square  of  the  local  neighborhood,  uniform  random  noise 
affects  these  elements  differently.  By  separating  the  translational  components,  this 
undesirable  property  should  disappear. 

To  separate  the  components,  first  let  (16)  be  rewritten  as 


where  is  a  2  x  2  matrix  relating  the  translational  affine  coefficients,  t,  to  the 
2-dimensional  vector,  1|,  is  a  4  x  4  matrix  relating  the  rotational,  shearing, 
and  scaling  affine  coefficients,  r,  to  the  4-dimensional  vector,  U,  and  Ac  is  a  4  x  2 
coupling  matrix.  Likewise,  Q  and  h  can  be  represented  by 


n-\Qr  Qc 

^~[q^  Qt\' 


At  this  point,  it  is  advantageous  to  expand  and  simplify  several  of  these  sub¬ 
matrices.  Expanding,  we  obtain 


Qr  = 


Qe  0 
0  Qe  ’ 


Qc  =  0 
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(taking  advantage  of  the  fact  that  =  ULi  <^^yi  =  0  for  elliptical  neigh¬ 

borhoods  due  to  odd  symmetry),  where 


Qe  =  Eu;? 


«=1 


x]  Xiyi 
Xiyi  y- 


(42) 


and  /  is  the  2x2  identity  matrix.  Note  that  the  elements  of  are  second  moments 
taken  within  the  elliptical  neighborhood.  With  this  in  mind,  these  second  moments 
should  be  roughly  proportional,  by  some  factor  17,  to  the  second  moments, 

Jty,  and  Jyy,  that  were  used  to  derive  this  ndghborhood  (see  (24)).  Therefore,  a 
reasonable  estimation  is 

Qt  « 17J.  (43) 

All  of  these  simplifications  will  allow  us  to  solve  directly  for  the  default  trans¬ 
lation.  They  will  also  guide  the  subsequent  derivation  of  the  match  vector  itself. 


11  Determining  The  Default  Translation 

What  remains  to  be  seen,  before  deriving  a  closed  form  expression  for  the  match, 
is  how  the  default  pure  translation  is  determined.  Using  (38),  finding  tmin  for  the 
neighborhood  is  much  like  finding  the  best  local  affine  transformation  with  the 
rotational,  shearing,  and  scaling  components  set  to  zero  (t.e.  A  =  /,  the  2x2 
identity  matrix),  or 

r  =  r,„<„=  [  1001]^.  (44) 

Thus,  we  simply  solve  a  system  of  equations 

Rttmin  =  (1»  -  Rc^min^  (45) 

Since  the  determination  of  a  purely  translational  matching  merely  requires  the 
intersection  of  two  constraint  lines,  this  system  of  equations  is  underconstrained 
only  when  there  are  no  feature  point  matches  and  the  constraint  lines  are  parallel. 
In  this  case,  we  choose  the  solution  closest  to  the  average  normal  component  by 
finding,  using  the  general  pseudoinverse  (denoted  by  -I-),  the  smallest  translation 
satisfying  the  constraints.  Hence, 

tmin  =  Rt  (If  -  Rj Tmin  )  •  (46) 

The  pseudoinverse  formulation  used  for  this  and  all  subsequent  calculations 
is  presented  in  Appendix  B.  It  differs  from  the  conventionally  used  form  in  sev¬ 
eral  ways.  First,  to  promote  stability,  the  SVD  threshold  is  chosen  such  that  the 
maximum  allowed  condition  number,  Kmax  (introduced  previously),  mandates  the 
absolute  minimum  eigenvalue  for  a  given  matrix.  Second,  values  below  this  thresh¬ 
old  are  not  immediately  deemed  singular,  but  rather  continuously  default  to  zero 
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as  they  drop  below  the  threshold.  Without  this  latter  modification,  matching  solu¬ 
tions  along  contours  may  vary  discontinuously,  defeating  the  smoothing  properties 
of  the  Gaussian  neighborhoods  and  producing  rather  non-intuitive  results. 


12  Direct  Match  Computation 

Finally,  we  are  now  in  a  good  position  to  solve  directly  for  the  match  vector. 
Making  use  of  the  separation  of  affine  components  and  the  simplifications  made 
possible  by  this  separation,  (36)  is  equivalent  to 

min  (  \  .  . 

t.r.r..r,  V  +  TjiRrT  +  RJ.  -  1,)  -h  rl(Rtt  +  Rjr  -h)  J  ^  ^ 

where  Ft  and  F^  are  the  Lagrange  multipliers  for  the  translational  and  rotational 
parts  of  the  constraints  respectively.  Taking  the  partial  derivatives  with  respect 
to  Ft,  Fr,  t  and  r  and  setting  them  to  zero  respectively  leaves 


Rjr  +  Rtt  =  It 


t  = 


RtT  -f*  Ret  =  L 
Xt 


+  tn 


where 


Six 

r  =  Qr^Xr  +  Tmin, 
X,  =  +  flTr.),  X,  = 


(48) 

(49) 

(50) 

(51) 

(52) 


since  =  rmim  the  default  for  rotational,  shearing,  and  scaling  components. 

FinaUy,  substitution  of  (50)  and  (51)  into  (48)  and  (49)  then  allows  us  to  solve 
for  Xt  and  derive 


h-RjRf\r 

t  =  (fl,-lirRffi,)*  (  -  (Rf flf l+t, 

-  {R^RfRr  -  Rj)r„i^ 


(53) 


where 

R*  =  Q;HRrQ7^f  = 


0  J 


-VI  ]  (* 


0 


'-VI  ]) 


(54) 


(using  the  approximation  of  (43))  and 


'vv 


-J. 


*v 


~J. 


*v 


(55) 


17 


Equation  (53)  not  only  minimizes  the  overall  deviation  between  neighborhood 
matches  and  the  best  pure  translation,  but  it  also  guarantees  a  unique,  stable 
match  regardless  of  the  constraints  or  the  neighborhood.  When  Rr  and  Qr  are 
non-singular,  then  =  R^^,  and  the  default,  rm,„,  factors  out  of  the  solution. 
Otherwise,  one  must  use  the  pseudoinverse.  The  latter  case  occurs  when  the 
neighborhood  contour  points  are  collinear,  since  at  least  three  non-collinear  point 
matches  are  required  to  uniquely  define  an  affine  transformation.  When  this  situa¬ 
tion  occurs,  Qr  affects  the  solution  by  altering  the  eigenvalues  or  Rr.  It  essentially 
transforms  the  Euclidean  affine  parameter  space  to  one  in  which  finding  the  closest 
affine  solution  vector  to  the  purely  translational  affine  transformation. 


®min  — 


Ifmtn 

^min 


(56) 


yields  an  affine  transformation  that  minimizes  A. 

This  completes  the  development  of  the  matching  scheme.  Applications  will 
undoubtedly  need  to  extend  or  modify  certiun  detuls.  For  instance,  the  devel¬ 
opment  here,  though  robust  to  noise  in  the  constraint  lines,  says  nothing  about 
how  such  lines  are  determined.  It  is  important  to  note,  however,  that  the  scheme 
has  been  developed  in  a  general  context  and  is  therefore  applicable  any  problem 
requiring  that  full  correspondence  between  contours  in  two  images  be  found  given 
only  partial  matching  constraints. 


13  Biological  Plausibility 

The  scheme  developed  in  the  previous  sections  is  motivated  in  part  by  a  number 
of  psychophysical  and  neurophysiological  findings  regarding  the  recovery  of  short- 
range  motion  in  the  primate  visual  system.  Evidence  of  varying  detail  indicates 
that  the  primate  motion  system  may  employ  several  components  of  the  match¬ 
ing  scheme,  including  the  affine  assumption,  the  influence  of  feature  points,  local 
Gaussian  neighborhoods,  oriented  elliptical  neighborhoods,  multiple  spatial  scales, 
and  minimal  solution  mechanisms. 

Positive  evidence  for  the  affine  assumption  is  sketchy.  Interestingly,  however, 
neurons  have  been  discovered  in  area  MST  of  the  macaque  monkey  that  seem  to 
be  selectively  responsive  to  divergence  [34],  deformation  [34,  29],  and  rotation  [35] 
of  the  visual  field,  respectively  equivalent  to  the  scaling,  shearing,  and  rotational 
transformations  that  can  be  described  by  an  affine  transformation.  Although  the 
role  of  these  neurons  is  not  entirely  clear,  they  may  be  involved,  along  with  area 
MT  neurons  selective  to  pure  translation  [1],  in  the  computation  of  a  local  affine 
approximation  to  the  flow  field.  Psychophysical  experiments  indicating  that  hu¬ 
mans  can  more  easily  estimate  the  structure  of  a  moving  wire-frame  from  only  two 
apparent  motion  frames  when  the  projections  undergo  an  affine  transformation 
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can  also  be  taken  as  weak  evidence  for  the  visual  system's  employment  of  this 
approximation  [39]. 

Terminator  feature  points  have  been  shown  to  increase  the  perception  of  rigid 
motion  when  added  to  a  moving  linear  contour  with  a  single  inflection  point  [31], 
much  like  specially  matched  feature  points  affect  the  determination  of  an  affine 
transformation  here.  Furthermore,  the  perception  of  rigid  contour  movement  in¬ 
creases  as  the  distance  between  a  terminator  and  the  inflection  point  decreases, 
suggesting  that  the  visual  system  employs  some  smooth  local  integration  process, 
such  as  the  Gaussian  neighborhoods  proposed  here.  This  observation  for  the  most 
part  rules  out  any  tjrpe  of  globally  calculated  smoothness  assumption  [12, 13],  since 
the  smoothest  velocity  field  for  a  purely  translating  contour  is  a  pure  translation 
regardless  of  terminator  locations. 

Further  results  obtained  using  the  same  contour  stimulus  have  also  indicated 
that  moving  terminators  placed  off  the  contour  can  weakly  influence  the  percep¬ 
tion  of  motion,  an  effect  that  can  be  blocked  by  non-moving  terminators  placed 
between  the  moving  terminators  and  the  inflection  point  [31].  This  result  supports 
the  notion  that  motion  measurements  are  integrated  primarily  along  contours  by 
oriented  receptive  fields,  much  like  the  oriented  neighborhoods  employed  here.  Not 
surprisingly,  oriented  receptive  fields  have  been  measured  in  wea  MT  with  esti¬ 
mated  dimensions  15  arc  min  x  5  arc  min  in  central  vision  (see  [29]). 

Parallel  short-range  motion  analysis  by  several  spatial  scales  is  also  supported 
by  psychophysical  evidence.  Several  results  indicate  that  both  input  to  and  in¬ 
tegration  within  the  motion  system  consists  of  several  spatial  scales  [1,  41,  29]. 
Moreover,  experiments  involving  the  motion  of  random  dots  have  indicated  that 
perception  of  global  flow,  as  opposed  to  local  independent  motion,  increases  as 
the  global  distribution  of  dots  becomes  more  coherent  [45]  (see  [36]).  This  result 
suggests  that  the  visual  system  applies  an  appropriate  assumption  to  the  largest 
possible  area  of  the  image,  much  like  the  mechanism  for  choosing  the  largest  neigh¬ 
borhood  that  satisfies  matching  (motion)  constraints. 

Finally,  evidence  for  minimal  solution  mechanisms  in  the  visual  system  can  be 
gathered  from  experiments  regarding  the  perception  of  non-rigid  motion,  which  for 
the  most  part  occurs  when  the  perception  of  pattern  motion  is  similar  to  the  normal 
velocity  measurements  rather  that  the  actual  motion  field.  Perception  of  coherent 
pattern  motion  of  two  moving  sine  wave  gratings  has  been  found  to  decrease  with 
the  angle  between  their  primary  directions  [1].  Similarly,  the  perceived  rigidity 
of  a  translating  sine  wave  contour  (of  the  form  asin(QX  +  0t))  decreases  as  the 
wave  becomes  more  shallow  (the  angle  at  the  intersection  of  the  curves  with  the 
X  axis  is  less  than  15  degrees)  [30].  Given  these  results,  it  has  been  suggested 
that  the  visual  system  pays  more  attention  to  the  normal  velocity  in  such  highly 
unstable  (ambiguous)  situations  in  order  to  boost  the  signal  to  noise  ratio  [31]. 
This  suggestion  is  consistent  with  the  minimal  solution  mechanism  proposed  here. 
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which  effectively  minimizes  the  summed  squared  deviations  between  neighborhood 
matches  and  the  average  neighborhood  normal  component  when  the  best  purely 
translational  matches  are  ambiguous. 


14  Implementation  and  Results 

The  matching  scheme  was  implemented  in  C  on  a  Sun  workstation.  Five  spatial 
scales  were  employed.  The  sizes,  in  terms  of  the  variance  of  the  local  Gaussian 
neighborhood,  (Taixti  were  4,  8,  16,  32,  and  64  pixels.  These  sizes  were  chosen  so 
that  the  largest  neighborhood  was  on  the  order  of  the  size  of  the  examples,  thereby 
ensuring  that  the  largest  local  neighborhood  could  roughly  include  the  constraints 
of  the  entire  example.  Input  consisted  of  two  256  x  256  pixel  binary  contour  im¬ 
ages.  Some  were  synthetically  produced,  while  others  were  extracted  zero  crossings 
of  smoothed  and  differentiated  natural  imagery  [23].  Though  the  method  should 
be  able  to  deal  with  any  set  of  contour  images  with  partially  constrained  matches, 
the  two  images  chosen  for  each  simulation  were  relatively  aligned  to  reflect  the 
fact  that  constraint  lines  are  most  easily  derived  from  mildly  differing  imagery 
(t.e.  time  sampled  imagery  for  short-range  motion,  or  roughly  aligned  objects 
for  object  recognition).  The  “correct”  matches  between  each  pair  of  images  were 
known  for  synthetic  imagery  and  hand-picked  for  natural  imagery.  The  constraint 
line  for  each  contour  point  was  determined  by  finding  the  local  orientation  of  the 
contour,  then  taking  the  line  with  the  same  orientation  that  passed  through  the 
“correct”  match.  In  the  context  of  visual  motion,  these  constraint  lines  roughly 
mimicked  a  local  measurement  of  normal  velocity  and  allowed  comparisons  to  be 
made  with  the  results  of  previous  methods  for  recovering  optical  flow. 

To  assess  the  affect  of  noise  upon  the  matching  process,  10%  random  noise 
was  added  to  the  components  of  the  normals  describing  the  constraint  lines  and 
feature  point  matches.  Assuming  this  equal  noise  distribution,  a  value  of  0.5  was 
used  for  a,  the  accuracy  of  feature  point  matches  relative  to  the  accuracy  of  con¬ 
straint  lines.  Furthermore,  the  parameter  Kmaxt  used  both  for  selecting  the  size 
of  the  neighborhoods  and  for  setting  the  SVD  threshold  in  pseudoinverse  opera¬ 
tions,  was  experimentally  determined  by  applying  the  scheme  to  the  worst  case 
(most  ambiguous)  matching  problem,  the  matching  of  parallel  lines  (see  example 
lb  in  appendix  A).  The  average  percentage  deviation  of  matches  with  respect  to 
the  actual  normal  components  of  the  match  (the  expected  minimal  solution)  was 
determined  for  several  values  of  Kmor-  Using  the  results,  shown  in  Figure  5,  the 
tradeoff  between  stability  (low  Kmax)  and  accuracy  (high  /c^ax)  was  balanced  by 
choosing  =  74.0,  the  highest  value  for  which  the  average  percentage  error 
did  not  exceed  the  percentage  noise.  This  value  was  used  for  all  subsequent  exam¬ 
ples,  the  results  of  which  are  shown  in  Appendix  A.  Note  that  this  choice  seems 
particularly  appropriate  considering  the  large  increase  in  matching  error  as  Kmax 
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Figure  5:  Average  percent  matching  error  vs.  K^ax  for  the  matching  of  parallel 
lines  with  10%  random  noise  in  the  constraint  lines. 


increases  beyond  this  value. 

15  Discussion 

As  demonstrated  by  the  results  presented  in  Appendix  A,  the  proposed  contour 
matching  scheme  robustly  recovers  the  correspondences  of  both  synthetic  and  nat¬ 
ural  contour  imagery.  First,  unlike  most  other  methods  for  matching  contours, 
this  scheme  yields  intuitive  solutions  for  ambiguous  cases,  such  as  parallel  lines 
and  rotated  ellipses.  Moreover,  matching  solutions  continuously  approach  stable 
defaults  as  the  matching  becomes  n  ore  ambiguous.  For  example,  as  a  translated 
sine  wave  becomes  more  linear  (smaller  amplitude),  the  solution  defaults  gradually 
to  the  normal  components  of  the  match.  In  the  context  of  short-range  motion,  this 
behavior  agrees  with  psychophysical  results  indicating  the  non-rigid  perception  of 
small  amplitude  translating  sine  waves  [30]  (see  earlier  section  on  biological  plausi¬ 
bility).  A  similar  default  occurs  as  a  rotated  ellipse  becomes  more  circular,  where 
tb'>  solution  again  agrees  with  psychophysical  data  indicating  that  the  smaller  the 
aspect  ratio  of  the  rotating  ellipse,  the  less  accurate  the  final  velocity  field  and  the 
more  non-rigid  the  interpretation.  Note  that  the  smoothness  assumption  [12,  13] 
yields  similar  results,  except  near  the  two  highest  curvature  points  of  the  ellipse, 
where  the  matching  is  more  accurate.  More  attention  must  be  paid  to  human 
perception  to  determine  which  of  these  interpretations  is  more  psychophysically 
accurate. 

A  second  attribute  of  the  scheme  concerns  the  usefulness  of  specially  matched 
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feature  points.  While  feature  point  matches  obviously  aid  the  matching  process 
of  the  ambiguously  matched  parallel  lines,  the  scheme  did  not  exploit  any  feature 
point  matches  for  subsequent  examples  and  nevertheless  recovered  the  actual  cor¬ 
respondences.  This  observation  indicates  that,  while  feature  point  matches  are 
especially  important  for  the  matching  of  ambiguous  examples,  they  are  no  more 
powerful  than  redundant  constraint  lines  when  used  in  conjunction  with  an  unam¬ 
biguous  example. 

A  third  observation  is  that  analysis  at  several  neighborhood  scales  greatly  en¬ 
hances  the  scheme’s  ability  to  recover  exact  matches,  particularly  those  collective 
matches  not  modeled  precisely  by  a  global  affine  transformation.  For  example, 
the  three-dimensional  distortions  used  to  generate  the  space  curve  did  not  present 
a  problem  to  the  matching  scheme.  The  attribution  of  this  capability  to  sev¬ 
eral  neighborhood  sizes  is  not  obvious  from  the  results,  but  instead  from  sim¬ 
ulations  that  reported  the  sizes  of  the  selected  neighborhoods  at  every  contour 
point.  Matching  the  views  of  the  face,  for  example,  required  small  neighborhoods 
for  matching  contours  with  intricate  detail  (such  as  the  eyes  and  nose)  and  large 
neighborhoods  for  matching  nearly  linear  contours  (such  as  the  occluding  bound- 
My  between  the  face  and  the  background).  It  is  not  until  we  examine  the  matches 
predicted  for  the  three-dimensional  rotated  wire  frame  that  we  see  the  iimitations 
of  multiple  scales.  Though  the  overall  matching  solution  is  quite  accurate,  the 
matching  of  overlapping  contours  undergoing  separate  motion  is  highly  inaccurate 
because  information  has  in  this  case  been  integrated  over  independent  contours. 

16  Conclusion  and  Future  Work 

A  method  for  determining  matches  between  contours  in  two  images  has  been  pro¬ 
posed,  developed,  and  tested,  assuming  that  constraint  lines,  each  narrowing  down 
the  match  for  a  contour  point  in  the  first  image  to  a  line  in  the  second  image,  are 
available  for  several  contour  points  in  the  first  image.  Suggested  applications  in¬ 
clude  the  determination  of  optical  flow  in  short-range  motion  and  the  matching  of 
aligned  contour  views  in  either  alignment-based  object  recognition  or  long-range 
motion. 

To  determine  the  match  for  a  contour  point  we  find  the  best  affine  transforma¬ 
tion,  in  a  weighted  least  squares  sense,  that  satisfies  the  match  constraint  lines  and 
feature  point  matches  within  an  oriented  elliptical  neighborhood.  This  neighbor¬ 
hood  is  established  by  weighing  the  constraint  equations.  The  weight  for  a  given 
point  constraint  is  the  modulation  of  the  Gaussian  distance  to  the  local  origin, 
which  establishes  a  circularly  symmetric  local  neighborhood,  by  the  Gaussian  dis¬ 
tance  from  the  axis  of  local  orientation,  which  attempts  to  limit  the  neighborhood 
to  a  single  contour.  The  width  of  the  modulating  Gaussian  is  set  such  that  the 
axes  of  the  elliptical  neighborhood  are  proportional  to  the  local  axes  of  inertia. 
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To  determine  the  width  of  the  circularly  symmetric  Gaussian,  the  effective  size  of 
the  local  neighborhood,  we  consider  several  sizes  simultaneously  and  choose  the 
smallest  one  which  yields  a  stable,  unique  solution,  as  determined  by  the  condition 
number  of  the  constraint  matrix,  R.  In  the  event  that  this  condition  number  ex¬ 
ceeds  the  parameter  Kmax  even  at  the  largest  neighborhood,  there  is  more  than  one 
possible  solution,  from  which  we  choose  the  smallest  affine  transformation  which 
predicts  the  set  of  neighborhood  matches  which  deviate  the  least  from  the  small¬ 
est  least  squares  purely  translational  matching.  Unique  determination  of  both  the 
smallest  pure  translation  and  the  final  match  is  guaranteed  through  the  use  of  a 
modified  pseudoinverse  operation. 

This  entire  procedure  is  repeated  for  every  contour  point.  Note,  however,  that 
the  match  for  each  point  may  be  computed  in  parallel.  Furthermore,  since  the 
predicted  match  for  a  point  is  simply  given  by  the  translational  component  of  the 
local  affine  transformation,  finding  the  match  involves  explicitly  solving  for  only  the 
two  translational  components  of  the  affine  transformation.  A  closed  form  solution 
for  the  match  involves  using  a  continuous  version  of  the  pseudoinverse  to  invert  a  2 
X  2  matrix  and  a  4  x  4  matrix,  the  coefficients  of  which  are  weighted  summations 
of  local  point  constraint  parameters  that  may  be  determined  in  parallel. 

The  matching  scheme  only  relies  on  two  parameters.  The  parameter,  a,  which 
ranges  between  0  and  1 ,  indicates  the  expected  accuracy  of  feature  point  matches 
relative  to  constraint  line  information.  The  parameter  Xmox,  which  must  be  set 
according  to  how  much  noise  is  expected  in  the  constraint  lines,  places  a  bound  on 
the  condition  number  of  any  constraint  matrix. 

As  for  biological  plausibility,  more  experimentation  is  necessary  to  compare 
the  performance  of  the  scheme  wi'h  that  of  the  visual  system,  in  both  motion 
perception  and  object  recognition.  However,  the  scheme  is  motivated  by  current 
biological  evidence  for  short-range  pattern  motion  and  can,  in  many  cases,  explain 
such  evidence  better  than  other  models  can. 

Simulation  results  show  that  the  scheme  performs  well  for  most  examples,  in¬ 
cluding  noisy  synthetic  imagery  and  edges  extracted  from  natural  imagery.  Mini¬ 
mal  solutions  obtuned  when  the  recovery  is  ill-posed  are  intuitive,  and  agree  with 
psychophysical  data.  It  seems  that  the  non-rigidity  of  contour  transformations 
arises  primarily  near  ambiguous  situations,  and  incorporation  of  terminators  im¬ 
proves  the  correspondence  significantly  for  such  near- ambiguous  examples. 

Despite  the  scheme’s  performance  and  biological  plausibility,  further  research 
remains.  First,  the  precise  conditions  under  which  the  local  affine  transformation  is 
uniquely  determined  by  the  constraints  should  be  better  understood.  Currently,  it 
is  difficult  to  predict  when  the  minimal  solution  mechanism  is  necessary  to  guaran¬ 
tee  uniqueness  without  examining  specific  cases  individually.  Second,  simulations 
should  more  rigorously  test  matching  capabilities  and  limitations.  Specifically, 
experiments  should  test  the  theoretical  effectiveness  of  using  oriented  elliptical 
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neighborhoods  to  limit  the  interaction  of  independent  contours.  They  should  also 
assess  matching  performance  upon  examples  larger  than  the  largest  neighborhood. 
With  this  data,  the  optimal  number  and  sizes  of  available  neighborhoods  can  be 
suggested.  Third,  application  specific  problems  can  be  addressed.  In  recovering 
optical  flow,  it  should  be  straightforward  to  test  the  scheme  using  extracted  nor¬ 
mal  velocities  from  actual  motion  frames.  In  alignment-based  recognition,  pre-  and 
post-processing  steps  for  finding  the  constraint  lines  and  final  matches  should  be 
developed  and  tested.  Finally,  extensions  and  modifications  to  the  method  can  be 
considered.  An  extended  scheme  which  makes  use  of  available  depth  information 
to  form  ellipsoidal  neighborhoods,  where  the  three-dimensional  Gaussian  distances 
are  used  to  weight  point  constraints,  may  effectively  deal  with  the  current  inac¬ 
curate  matching  caused  by  overlapping,  independently  moving  contours.  Finally, 
prediction  mechanisms  such  as  Kalman  filtering  techniques  can  be  incorporated 
for  applications,  such  as  recovering  optical  flow,  that  are  aided  by  temporal  im¬ 
provement  of  continuous  matching  solutions  over  multiple  image  frames. 

A  Simulation  Results 

In  each  example  presented  the  first  contour  image  is  shown  in  black,  and  the 
second  contour  image  is  superimposed  as  dotted  contours.  In  column  /,  vectors 
describing  the  constraint  lines  are  shown  for  sampled  points  along  the  first  contour, 
and  special  match  vectors  are  indicated  by  small  squares  at  the  endpoints.  10% 
random  noise  has  been  added  to  these  vectors.  The  computed  and  actual  match 
vectors  for  sampled  points  are  then  shown  in  columns  II  and  III,  allowing  qualita¬ 
tive  comparison.  Quantitative  results  are  also  presented.  At  each  sampled  point, 
the  relative  error  is  computed  by  normalizing  the  distance  between  predicted  and 
actual  matches  by  the  length  of  the  actual  match  vector.  The  average  relative 
error,  computed  over  all  sampled  points,  is  reported  for  each  example.  Note  that 
predicted  matches  do  not  necessarily  lie  exactly  upon  the  second  contour,  espe¬ 
cially  for  the  matching  of  ambiguous  examples.  In  practice,  final  matches  may  be 
found  by  taking  the  point  on  the  second  contour  that  is  closest  to  the  predicted 
match. 

Rows  la  and  lb  respectively  present  the  matching  of  synthetic  parallel  lines 
with  and  without  terminator  matches.  Rows  2a  and  2b  respectively  present  the 
matching  of  translated  synthetic  large  and  small  amplitude  sine  waves.  Next,  row 
4a  presents  the  matching  of  a  synthetic  ellipse  rotated  in  the  image  plane  by  20 
degrees,  and  row  4b  presents  the  matching  of  synthetic  rotated  concentric  circles. 
In  row  4a  an  orthographic  projection  of  a  synthetic  wire-frame  is  matched  for 
purely  translational  correspondence,  and  in  row  4b  the  same  wire-frame  is  rotated 
by  20  degrees  about  each  axis  and  matched.  Row  5  presents  the  matching  of  an 
orthographic  projection  of  an  arbitrary  synthetic  3D  space  curve  rotated  by  10 
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degrees  about  each  axis,  translated,  scaled  by  a  factor  of  1.2,  and  stretched  by  a 
factor  of  1.1  along  each  axis.  Finally,  the  matching  of  the  edges  obtained  from  a 
pair  of  roughly  aligned  natural  views  of  a  Volkswagen  (borrowed  from  [5]),  a  tank, 
a  doll  face  (borrowed  from  [37]),  and  a  pair  of  scissors  (borrowed  from  [5])  are 
respectively  presented  in  rows  6  through  9. 
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B  Modified  Pseudoinverse 


This  appendix  describes  two  modifications  to  the  general  pseudoinverse.  The  gen¬ 
eral  pseudoinverse  is  defined  for  any  n  x  m  matrix  A.  First,  A  is  uniquely  diago¬ 
nalized  as 

A  =  QxAQl  (57) 

where  Qi  and  Q2  are  n  xm  and  mxn  orthonormal  matrices  respectively,  A  is  the 
mxn  matrix  of  the  singular  values  of  A, 


A  = 


0  0  •••  0 
0  •  •  •  0 


(58) 


(in  this  particular  case  for  n  <  m),  and  Ai  •  •  •  An  are  the  eigenvalues  of  This 
process  is  called  singular  value  decomposition  (SVD).  Given  this  diagonalization, 
the  general  form  of  the  pseudoinverse  is 

A+  =  QaA-^Qf  (59) 


where 

A+  _  /  l/^»i  ^i}  ^  ^ 

I  0  otherwise 


(60) 


and  t  is  a  threshold  below  which  values  are  effectively  singular.  The  higher  the 
threshold,  the  more  stable,  but  the  less  precise,  the  result.  To  regulate  stability, 
we  choose  this  threshold  such  that  the  condition  number  of  A^A  is  less  than  some 
number,  Kmax-  Thus,  our  threshold  is 


t  = 


/  Afnax 

V  ^mox 


(61) 


since  the  condition  number  is  the  ratio  of  the  maximum  to  minimum  eigenvalues. 
This  is  the  first  modification. 

The  problem  with  this  formulation  of  the  general  pseudoinverse  is  that  the 
solution  suddenly  becomes  singular  when  A  is  not  full  rank.  This  discontinuity, 
due  to  a  very  small  change  in  A,  is  quite  sharp  because  the  elements  of  A do  not 
continuously  vary  with  the  elements  of  A.  Consider  instead  that  the  elements  of 
A***  gradually  default  to  zero  using: 


A+  _  f  1/A,j  if  \ij  >  t 
(  Aij/t^  otherwise 


(62) 


Now  the  elements  of  A"*"  are  continuous  functions  of  the  elements  of  A.  This  is  the 
second  modification.  (See  [38]  for  a  more  detailed  presentation  of  SVD  and  the 
general  pseudoinverse.) 
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